By the end of this tutorial, you will be able to:
LiDAR (Light Detection and Ranging) is an active remote sensing technology that measures distances by illuminating targets with laser pulses and analyzing the reflected light. Airborne Laser Scanning (ALS) systems mounted on aircraft or drones can collect millions of precise 3D points representing the Earth’s surface and features on it.
The lidR package is a comprehensive R toolkit for airborne LiDAR data manipulation and visualization. Created by Jean-Romain Roussel, it provides tools for:
Documentation: https://r-lidar.github.io/lidRbook/
You’ll work with Airborne Laser Scanning (ALS) data from Aarhus University campus in Denmark. The data is in LAZ format (compressed LAS), which is a standard format for storing LiDAR point clouds.
Data Source: Danish Agency for Data Supply and
Infrastructure
Coverage: University Park, Aarhus (1 km × 1 km
tile)
Free Download: https://dataforsyningen.dk/data/3931
Question 1: What is the difference between LAS and LAZ file formats? LAZ is the compressed form of LAS, which is uncompressed. —
The readLAS() function loads LiDAR data into R as a LAS
object.
LAS objects have two main components: - @header: Metadata about the point cloud (extent, CRS, point counts, etc.) - @data: The actual point data (X, Y, Z coordinates and attributes)
## File signature: LASF
## File source ID: 0
## Global encoding:
## - GPS Time Type: GPS Week Time
## - Synthetic Return Numbers: no
## - Well Know Text: CRS is GeoTIFF
## - Aggregate Model: false
## Project ID - GUID: 00000000-0000-0000-0000-000000000000
## Version: 1.4
## System identifier:
## Generating software:
## File creation d/y: 0/0
## header size: 375
## Offset to point data: 1005
## Num. var. length record: 1
## Point data format: 7
## Point data record length: 48
## Num. of point records: 1934412
## Num. of points by return: 1934412 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## Scale factor X Y Z: 0.01 0.01 0.01
## Offset X Y Z: 0 0 0
## min X Y Z: 574448.3 6225346 -1.37
## max X Y Z: 574930.6 6225733 92.88
## Variable Length Records (VLR):
## Variable Length Record 1 of 1
## Description: by LAStools of rapidlasso GmbH
## Extra Bytes Description:
## Blue: additional attributes
## Green: additional attributes
## Red: additional attributes
## Extended Variable Length Records (EVLR): void
## X Y Z gpstime Intensity ReturnNumber
## <num> <num> <num> <num> <int> <int>
## 1: 574642.9 6225346 37.91 111077394 44 1
## 2: 574643.5 6225346 37.86 111077394 25 1
## 3: 574644.1 6225346 37.75 111077394 39 1
## 4: 574645.4 6225346 37.45 111077394 31 1
## 5: 574645.9 6225346 37.45 111077394 21 1
## ---
## 1934408: 574473.3 6225504 44.59 111078674 136 1
## 1934409: 574472.8 6225504 44.59 111078674 143 1
## 1934410: 574472.1 6225504 44.61 111078674 138 1
## 1934411: 574472.9 6225504 44.56 111078674 152 1
## 1934412: 574472.3 6225504 44.58 111078674 123 1
## NumberOfReturns ScanDirectionFlag EdgeOfFlightline Classification
## <int> <int> <int> <int>
## 1: 1 1 0 2
## 2: 1 1 0 2
## 3: 1 1 0 2
## 4: 1 1 0 2
## 5: 1 1 0 2
## ---
## 1934408: 1 1 0 2
## 1934409: 1 1 0 2
## 1934410: 1 1 1 2
## 1934411: 1 1 0 2
## 1934412: 1 1 1 2
## ScannerChannel Synthetic_flag Keypoint_flag Withheld_flag Overlap_flag
## <int> <lgcl> <lgcl> <lgcl> <lgcl>
## 1: 0 FALSE FALSE FALSE FALSE
## 2: 0 FALSE FALSE FALSE FALSE
## 3: 0 FALSE FALSE FALSE FALSE
## 4: 0 FALSE FALSE FALSE FALSE
## 5: 0 FALSE FALSE FALSE FALSE
## ---
## 1934408: 0 FALSE FALSE FALSE FALSE
## 1934409: 0 FALSE FALSE FALSE FALSE
## 1934410: 0 FALSE FALSE FALSE FALSE
## 1934411: 0 FALSE FALSE FALSE FALSE
## 1934412: 0 FALSE FALSE FALSE FALSE
## ScanAngle UserData PointSourceID R G B Blue Green Red
## <num> <int> <int> <int> <int> <int> <num> <num> <num>
## 1: 0 2 45063 38400 35584 34560 34560 35584 38400
## 2: 0 2 45063 24320 21504 20480 20480 21504 24320
## 3: 0 2 45063 17664 14848 13824 13824 14848 17664
## 4: 0 2 45063 44288 43264 40960 40960 43264 44288
## 5: 0 2 45063 45056 43776 42240 42240 43776 45056
## ---
## 1934408: 0 2 45064 15360 15872 13056 13056 15872 15360
## 1934409: 0 2 45064 15616 16128 13312 13312 16128 15616
## 1934410: 0 2 45064 17664 18176 15360 15360 18176 17664
## 1934411: 0 2 45064 16896 16128 14336 14336 16128 16896
## 1934412: 0 2 45064 16384 16384 13312 13312 16384 16384
## class : LAS (v1.4 format 7)
## memory : 236.1 Mb
## extent : 574448.3, 574930.6, 6225346, 6225733 (xmin, xmax, ymin, ymax)
## coord. ref. : NA
## area : 185488 units²
## points : 1.93 million points
## type : airborne
## density : 10.43 points/units²
## density : 10.43 pulses/units²
## File signature: LASF
## File source ID: 0
## Global encoding:
## - GPS Time Type: GPS Week Time
## - Synthetic Return Numbers: no
## - Well Know Text: CRS is GeoTIFF
## - Aggregate Model: false
## Project ID - GUID: 00000000-0000-0000-0000-000000000000
## Version: 1.4
## System identifier:
## Generating software:
## File creation d/y: 0/0
## header size: 375
## Offset to point data: 1005
## Num. var. length record: 1
## Point data format: 7
## Point data record length: 48
## Num. of point records: 1934412
## Num. of points by return: 1934412 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## Scale factor X Y Z: 0.01 0.01 0.01
## Offset X Y Z: 0 0 0
## min X Y Z: 574448.3 6225346 -1.37
## max X Y Z: 574930.6 6225733 92.88
## Variable Length Records (VLR):
## Variable Length Record 1 of 1
## Description: by LAStools of rapidlasso GmbH
## Extra Bytes Description:
## Blue: additional attributes
## Green: additional attributes
## Red: additional attributes
## Extended Variable Length Records (EVLR): void
Question 2: What information is stored in the @header vs @data? The data contains the actual point data, so what values are found for each point, the values here contain information about the location, time, intensity, the colour values for red, green and blue. And many other attributes. The header contains general information about the dataset, like how many points were recorded, the location range, the CRS and so on.
## [1] 1934412
## [1] 1934412
## [1] 1934412
Question 3: How many points does this LiDAR dataset contain? [1] 1934412
The Z coordinate represents elevation. Let’s explore its distribution.
# Basic histogram using base R
hist(Unipark$Z,
xlab = "Elevation",
main = "Distribution of Elevation Values",
col = "skyblue",
border = "white"
)# Create a data frame for ggplot
uni_df <- data.frame(
x = Unipark$X,
y = Unipark$Y,
z = Unipark$Z
)
# Advanced histogram using ggplot2
ggplot(data = uni_df, aes(x = z)) +
geom_histogram(fill = "steelblue",
color = "white",
bins = 50) +
labs(
title = "Distribution of Z Values",
x = "Elevation (m)",
y = "Count"
) +
theme_minimal()## [1] -1.37 92.88
Question 4: What is the range of Z values in the
dataset? (Check with range(Unipark$Z)) [1] -1.37 92.88
Question 5: Can we directly use these Z values to
determine vegetation height? Why or why not? Vegetation can only be
-1.37 if 0 is the ocean level. Thereby the points need to be calibrated
to the real ground level to derive the vegetation height. —
Note: The plot() function creates interactive 3D
visualizations of point clouds. However, this only works if you run R
and RStudio locally on your computer, since Posit Cloud does not have
the graphics engine to display these 3D plots. Alternatively, you can
plot 2D transects through the point cloud with ggplot2.
Tip: Use your mouse to rotate, zoom, and pan the 3D view!
Alternative plot of a point cloud transect with ggplot:
# Create a LiDAR transect
p1 <- c(574500, 6225500) # define starting point of transect
p2 <- c(574700, 6225500) # define ending point of transect
las_tr <- clip_transect(Unipark, p1, p2, width = 20, xz = TRUE)
# plot transect with ggplot with x on the x-axis, elevation on the y-axis and colored by elevation
ggplot(payload(las_tr), aes(x = X, y = Z, color = Z)) +
geom_point(size = 0.1) +
coord_equal() +
theme_minimal() +
scale_color_viridis_c()Many LiDAR datasets include RGB color information captured during flight.
# Colorize by RGB values
# plot(Unipark, color = "R, G, B") # use for 3D plots on your local computerAlternative plot of a point cloud transect with ggplot:
# Create a smaller LiDAR transect
p1 <- c(574500, 6225500)
p2 <- c(574700, 6225500)
las_tr2 <- clip_transect(Unipark, p1, p2, width = 5, xz = TRUE)
# plot transect with ggplot and RGB colors
color_plot <- rgb(las_tr2$R/65535, las_tr2$G/65535, las_tr2$B/65535) # generate RGB colors; divide by max 16-bit value to scale from 0-1
ggplot(payload(las_tr2), aes(x = X,y = Z, color = color_plot)) +
geom_point(size = 0.3) +
coord_equal() +
theme_minimal() +
theme(legend.position = "none") +
scale_color_manual(values = color_plot)LiDAR points are classified according to ASPRS (American Society for Photogrammetry & Remote Sensing) standards.
| Code | Class | Description |
|---|---|---|
| 1 | Unclassified | Never classified |
| 2 | Ground | Bare earth and terrain |
| 3 | Low Vegetation | < 0.5 m height |
| 4 | Medium Vegetation | 0.5 - 2 m height |
| 5 | High Vegetation | > 2 m height |
| 6 | Building | Structures |
| 7 | Low Point (Noise) | Outliers |
| 9 | Water | Water surfaces |
| 18 | High Noise | Aerial noise |
Reference: ASPRS LAS 1.4 Specification
# Colorize by classification
# plot(Unipark, # use for 3D plots on your local computer
# color = Classification,
# colorPalette = c("yellow", "lightgreen", "green",
# "darkgreen", "grey", "darkgrey",
# "blue", "lightblue", "white")
# ) Alternative plot of a point cloud transect with ggplot:
# plot transect with ggplot and Classification colors
class <- factor(las_tr2$Classification) # store classification as discrete factors so that discrete colors can be assigned
ggplot(payload(las_tr2), aes(x = X,y = Z, color = class) ) +
geom_point(size = 0.3) +
coord_equal() +
theme_minimal() +
scale_color_manual(values = c("grey", "yellow", "lightgreen", "green",
"darkgreen", "darkgrey","white","lightblue" ))Question 6: Which colors represent vegetation vs. buildings in the plot?
## [1] 2 5 4 3 6 7 1 19 20 18 9
##
## 1 2 3 4 5 6 7 9 18 19
## 30430 1037484 11481 27855 603276 209873 2224 5246 2022 4386
## 20
## 135
The filter_poi() function (Points Of Interest) filters
point clouds based on conditions.
# Filter ground points (Class 2)
ground_points <- filter_poi(Unipark, Classification == 2)
subset_i <- seq(0,length(ground_points$X),10) # use every 10th point
ggplot(payload(ground_points[subset_i]), aes(X,Y, color = Z )) +
geom_point(size = 0.1) +
coord_equal() +
theme_minimal() +
scale_color_viridis_c()# Filter low vegetation (Class 3)
low_veg <- filter_poi(Unipark, Classification == 3)
ggplot(payload(low_veg), aes(X,Y, color = Z )) +
geom_point(size = 0.1) +
coord_equal() +
theme_minimal() +
scale_color_viridis_c()# Filter medium vegetation (Class 4)
med_veg <- filter_poi(Unipark, Classification == 4)
ggplot(payload(med_veg), aes(X,Y, color = Z )) +
geom_point(size = 0.1) +
coord_equal() +
theme_minimal() +
scale_color_viridis_c()# Filter high vegetation (Class 5)
high_veg <- filter_poi(Unipark, Classification == 5)
subset_i <- seq(0,length(high_veg$X),5) # use every 10th point
ggplot(payload(high_veg[subset_i]), aes(X,Y, color = Z )) +
geom_point(size = 0.1) +
coord_equal() +
theme_minimal() +
scale_color_viridis_c()# Filter buildings (Class 6)
buildings <- filter_poi(Unipark, Classification == 6)
ggplot(payload(buildings), aes(X,Y, color = Z )) +
geom_point(size = 0.1) +
coord_equal() +
theme_minimal() +
scale_color_viridis_c()# Filter water (Class 9)
water <- filter_poi(Unipark, Classification == 9)
ggplot(payload(water), aes(X,Y, color = Z )) +
geom_point(size = 0.1) +
coord_equal() +
theme_minimal() +
scale_color_viridis_c()## [1] 1037484
Question 7: How many points are classified as
ground? (Use nrow(ground_points)) [1] 1037484
Question 8: Which classification contains the most
points? The class 2 which contains all ground Points. —
Raw Z values represent elevation above sea level, not height above ground. To measure vegetation height, we need to normalize the point cloud by subtracting the ground elevation.
Vegetation Height = Z value - Ground Elevation
A DTM represents the bare earth surface. We’ll use a Triangulated Irregular Network (TIN) algorithm.
# Generate DTM using TIN algorithm
dtm <- rasterize_terrain(Unipark,
algorithm = tin(),
use_class = c(2L, 9L), # use ground and water points to interpolate terrain
res = 1) # 1-meter resolution
# Plot the DTM
plot(dtm,
main = "Digital Terrain Model (DTM)",
col = terrain.colors(50)
)Question 9: What does the DTM represent? It models the earth’s terrestrial surface. By using the ground and water points the lowest possible points of the las are taken for the model.
The normalize_height() function subtracts ground
elevation from each point’s Z value.
# Normalize heights using the DTM
norm_height <- normalize_height(Unipark, dtm)
par(mfrow = c(1,2))
# Compare original vs. normalized heights
hist(Unipark$Z,
main = "Original Z Values",
xlab = "Elevation (m)",
col = "coral"
)
hist(norm_height$Z,
main = "Normalized Heights",
xlab = "Height above ground (m)",
col = "forestgreen"
)## [1] -41.41 48.59
Question 10: What is the range of normalized heights? How does this compare to the original Z values? [1] -41.41 48.59 - normalized [1] -1.37 92.88 - original
The range is the same but it shifted towards 0 and the distribution changed. Ofcourse this happend because the height of the surface was substracted. Thereby the Z values for each points are now relative to each other.
Now let’s extract only vegetation points (classes 3, 4, 5) with meaningful heights.
# Filter vegetation: classes 3, 4, 5, and height > 0.5m
vegetation <- filter_poi(
norm_height,
Classification %in% c(3L, 4L, 5L) & Z >= 0.5
)
# Visualize vegetation only
# Create a LiDAR transect
p1 <- c(574500, 6225500)
p2 <- c(574700, 6225500)
las_tr_veg <- clip_transect(vegetation, p1, p2, width = 20, xz = TRUE)
# plot transect with ggplot
ggplot(payload(las_tr_veg), aes(X,Z, color = Z)) +
geom_point(size = 0.1) +
coord_equal() +
theme_minimal() +
scale_color_viridis_c()## [1] 627587
Question 11: How many vegetation points remain after filtering? [1] 627587
A CHM represents the height of the vegetation canopy above ground.
par(mfrow = c(1,1))
# Calculate CHM using 95th percentile of heights in each pixel
canopy_height <- rasterize_canopy(
vegetation,
func = quantile(vegetation$Z, probs = 0.95), # 95th percentile
res = 1 # 1-meter resolution
)
# Plot the CHM
plot(canopy_height,
main = "Canopy Height Model (CHM)",
col = colorRampPalette(c("yellow", "green", "darkgreen"))(50)
)Question 12: Why use the 95th percentile instead of the maximum height? To avoid including outliers that dont count to the actual crown.
Let’s identify areas with trees taller than 20 meters.
# Create a mask for trees > 20m
tall_tree_mask <- canopy_height
tall_tree_mask[tall_tree_mask < 20] <- NA
# Apply mask to CHM
tall_trees <- mask(canopy_height, tall_tree_mask) # use function from raster package
# Plot only tall trees
plot(tall_trees,
main = "Trees Taller than 20 meters",
col = colorRampPalette(c("orange", "red", "darkred"))(50)
)Question 13: Where are the tallest trees located in the park? (Describe geographically: north/south/east/west) The tallest trees are located in the south of the park. But aslso the west cntains more tall trees than the east. Nevertheless the south shows the strongest gradient for tall trees.
Vegetation structure can be characterized using various height percentiles and summary statistics.
# Calculate mean height in 1m grid cells
h_mean <- rasterize_canopy(
vegetation,
func = ~mean(Z),
res = 1
)
plot(h_mean,
main = "Mean Vegetation Height",
col = terrain.colors(50)
)Percentiles describe the distribution of heights in each grid cell.
par(mfrow = c(2,2))
# 25th percentile (1st quartile)
h_25p <- grid_metrics(
vegetation,
func = quantile(Z, probs = 0.25),
res = 1
)
plot(h_25p, main = "25th Percentile Height")
# 50th percentile (median)
h_50p <- grid_metrics(
vegetation,
func = quantile(Z, probs = 0.5),
res = 1
)
plot(h_50p, main = "Median Height")
# 75th percentile (3rd quartile)
h_75p <- grid_metrics(
vegetation,
func = quantile(Z, probs = 0.75),
res = 1
)
plot(h_75p, main = "75th Percentile Height")
# 95th percentile (used for CHM)
h_95p <- grid_metrics(
vegetation,
func = quantile(Z, probs = 0.95),
res = 1
)
plot(h_95p, main = "95th Percentile Height")Question 14: What does the 25th percentile height tell us about vegetation structure? It shows the distribution of the lowest 25% height in the canopy strcuture. This way exceptionally small patches could be found.
Question 15: Which percentile (25th, 50th, 75th, or 95th) best represents the dominant canopy height? As 95% includes the most but excludes high outliers i would use that percentile. —
The pulse penetration ratio measures how many laser pulses reached the ground, indicating canopy openness.
Formula: Pulse Penetration Ratio = (Ground Points) / (Total Points)
par(mfrow = c(1,1))
# Define function to calculate ratio
calc_ratio <- function(z_ground, z_total) {
ratio <- z_ground / z_total
return(ratio)
}
# Calculate at 1m resolution
pulse_pen_1m <- grid_metrics(
norm_height,
func = calc_ratio(
length(Z[Classification == 2]), # Ground points
length(Z) # Total points
),
res = 1
)
plot(pulse_pen_1m,
main = "Pulse Penetration Ratio (1m)",
col = colorRampPalette(c("darkgreen", "yellow"))(50)
)Question 16: What do high vs. low pulse penetration values indicate? High pulse penetration values mean that the vegetation is sparse because a lot of lasers hit the ground while a low ratio means that the vegetation is dense. Or at least not a lot of Lasers hit the ground in this case that is also true for all the builduings.
Count points in different height layers to understand vertical structure.
par(mfrow = c(2,2))
# 0-1 meter layer (understory)
layer_0_1m <- grid_metrics(
norm_height,
func = sum(Z >= 0 & Z <= 1),
res = 1
)
plot(layer_0_1m, main = "Point Density: 0-1m")
# 1-5 meter layer (shrub layer)
layer_1_5m <- grid_metrics(
norm_height,
func = sum(Z >= 1 & Z <= 5),
res = 1
)
plot(layer_1_5m, main = "Point Density: 1-5m")
# 5-10 meter layer (sub-canopy)
layer_5_10m <- grid_metrics(
norm_height,
func = sum(Z >= 5 & Z <= 10),
res = 1
)
plot(layer_5_10m, main = "Point Density: 5-10m")
# 10-50 meter layer (canopy)
layer_10_50m <- grid_metrics(
norm_height,
func = sum(Z >= 10 & Z <= 50),
res = 1
)
plot(layer_10_50m, main = "Point Density: 10-50m")Question 17: Which height layer has the most points? What does this tell you about the vegetation structure? The 0-1m layer got the most points that means the most vegetation found in coverage seems to be in the herb and shrub layer. So a shrub and herb/grass dominated vegetation is present.
Calculating metrics at different resolutions reveals patterns at different spatial scales.
par(mfrow = c(1,3))
# Pulse penetration at 1m resolution (already calculated)
plot(pulse_pen_1m, main = "Pulse Penetration: 1m Resolution")
# Pulse penetration at 5m resolution
pulse_pen_5m <- grid_metrics(
norm_height,
func = calc_ratio(
length(Z[Classification == 2]),
length(Z)
),
res = 5 # 5-meter resolution
)
plot(pulse_pen_5m, main = "Pulse Penetration: 5m Resolution")
# Pulse penetration at 10m resolution
pulse_pen_10m <- grid_metrics(
norm_height,
func = calc_ratio(
length(Z[Classification == 2]),
length(Z)
),
res = 10 # 10-meter resolution
)
plot(pulse_pen_10m, main = "Pulse Penetration: 10m Resolution")Question 18: How does the pattern change as you increase the resolution from 1m to 10m? The higher values dominate the high resoltuions, while with lowering resolution the lower values start dominating.
Question 19: Which resolution is most appropriate for: - Detailed tree-level analysis? You can realy identify individual trees at the 1m resolution so i would pick that one. - Landscape-level patterns? Landscape level patterns are probably already detectable at 10m resolution eventhough why not use the higher resolution if possible?
You can calculate multiple metrics simultaneously using custom functions.
# Define a function to calculate multiple metrics
#my_metrics <- function(z) {
# list(
# mean_height = mean(z),
# max_height = max(z),
#sd_height = sd(z),
#cv_height = sd(z) / mean(z) * 100, # Coefficient of variation
#n_points = length(z)
#)
#}
# Calculate all metrics at once
#all_metrics <- grid_metrics(
# vegetation,
# func = ~my_metrics(z),
# res = 5
#)
# Plot each metric
#plot(all_metrics, "mean_height", main = "Mean Height (5m)")
#plot(all_metrics, "max_height", main = "Max Height (5m)")
#plot(all_metrics, "sd_height", main = "Height Std Dev (5m)")
#plot(all_metrics, "cv_height", main = "Height CV (5m)")